138 research outputs found

    Whole-genome sequence analysis for pathogen detection and diagnostics

    Get PDF
    This dissertation focuses on computational methods for improving the accuracy of commonly used nucleic acid tests for pathogen detection and diagnostics. Three specific biomolecular techniques are addressed: polymerase chain reaction, microarray comparative genomic hybridization, and whole-genome sequencing. These methods are potentially the future of diagnostics, but each requires sophisticated computational design or analysis to operate effectively. This dissertation presents novel computational methods that unlock the potential of these diagnostics by efficiently analyzing whole-genome DNA sequences. Improvements in the accuracy and resolution of each of these diagnostic tests promises more effective diagnosis of illness and rapid detection of pathogens in the environment. For designing real-time detection assays, an efficient data structure and search algorithm are presented to identify the most distinguishing sequences of a pathogen that are absent from all other sequenced genomes. Results are presented that show these "signature" sequences can be used to detect pathogens in complex samples and differentiate them from their non-pathogenic, phylogenetic near neighbors. For microarray, novel pan-genomic design and analysis methods are presented for the characterization of unknown microbial isolates. To demonstrate the effectiveness of these methods, pan-genomic arrays are applied to the study of multiple strains of the foodborne pathogen, Listeria monocytogenes, revealing new insights into the diversity and evolution of the species. Finally, multiple methods are presented for the validation of whole-genome sequence assemblies, which are capable of identifying assembly errors in even finished genomes. These validated assemblies provide the ultimate nucleic acid diagnostic, revealing the entire sequence of a genome

    Genome assembly forensics: finding the elusive mis-assembly

    Get PDF
    A collection of software tools is combined for the first time in an automated pipeline for detecting large-scale genome assembly errors and for validating genome assemblies

    The rise of a digital immune system

    Get PDF
    Driven by million-fold improvements in biotechnology, biology is increasingly shifting towards high-resolution, quantitative approaches to study the molecular dynamics of entire populations. One exciting application enabled by this new era of biology is the “digital immune system”. It would work in much the same way as an adaptive, biological immune system: by observing the microbial landscape, detecting potential threats, and neutralizing them before they spread beyond control. With the potential to have an enormous impact on public health, it is time to integrate the necessary biotechnology, computational, and organizational systems to seed the development of a global, sequencing-based pathogen surveillance system.https://doi.org/10.1186/2047-217X-1-

    The evolution of the natural killer complex; a comparison between mammals using new high-quality genome assemblies and targeted annotation.

    Get PDF
    Natural killer (NK) cells are a diverse population of lymphocytes with a range of biological roles including essential immune functions. NK cell diversity is in part created by the differential expression of cell surface receptors which modulate activation and function, including multiple subfamilies of C-type lectin receptors encoded within the NK complex (NKC). Little is known about the gene content of the NKC beyond rodent and primate lineages, other than it appears to be extremely variable between mammalian groups. We compared the NKC structure between mammalian species using new high-quality draft genome assemblies for cattle and goat; re-annotated sheep, pig, and horse genome assemblies; and the published human, rat, and mouse lemur NKC. The major NKC genes are largely in the equivalent positions in all eight species, with significant independent expansions and deletions between species, allowing us to propose a model for NKC evolution during mammalian radiation. The ruminant species, cattle and goats, have independently evolved a second KLRC locus flanked by KLRA and KLRJ, and a novel KLRH-like gene has acquired an activating tail. This novel gene has duplicated several times within cattle, while other activating receptor genes have been selectively disrupted. Targeted genome enrichment in cattle identified varying levels of allelic polymorphism between the NKC genes concentrated in the predicted extracellular ligand-binding domains. This novel recombination and allelic polymorphism is consistent with NKC evolution under balancing selection, suggesting that this diversity influences individual immune responses and may impact on differential outcomes of pathogen infection and vaccination

    Versatile and open software for comparing large genomes

    Get PDF
    The newest version of MUMmer easily handles comparisons of large eukaryotic genomes at varying evolutionary distances, as demonstrated by applications to multiple genomes. Two new graphical viewing tools provide alternative ways to analyze genome alignments. The new system is the first version of MUMmer to be released as open-source software. This allows other developers to contribute to the code base and freely redistribute the code. The MUMmer sources are available at

    Probing the pan-genome of Listeria monocytogenes: new insights into intraspecific niche expansion and genomic diversification

    Get PDF
    Bacterial pathogens often show significant intraspecific variations in ecological fitness, host preference and pathogenic potential to cause infectious disease. The species of Listeria monocytogenes, a facultative intracellular pathogen and the causative agent of human listeriosis, consists of at least three distinct genetic lineages. Two of these lineages predominantly cause human sporadic and epidemic infections, whereas the third lineage has never been implicated in human disease outbreaks despite its overall conservation of many known virulence factors. Here we compare the genomes of 26 L. monocytogenes strains representing the three lineages based on both in silico comparative genomic analysis and high-density, pan-genomic DNA array hybridizations. We uncover 86 genes and 8 small regulatory RNAs that likely make L. monocytogenes lineages differ in carbohydrate utilization and stress resistance during their residence in natural habitats and passage through the host gastrointestinal tract. We also identify 2,330 to 2,456 core genes that define this species along with an open pan-genome pool that contains more than 4,052 genes. Phylogenomic reconstructions based on 3,560 homologous groups allowed robust estimation of phylogenetic relatedness among L. monocytogenes strains. Our pan-genome approach enables accurate co-analysis of DNA sequence and hybridization array data for both core gene estimation and phylogenomics. Application of our method to the pan-genome of L. monocytogenes sheds new insights into the intraspecific niche expansion and evolution of this important foodborne pathogen.https://doi.org/10.1186/1471-2164-11-50

    Automated ensemble assembly and validation of microbial genomes

    Get PDF
    The continued democratization of DNA sequencing has sparked a new wave of development of genome assembly and assembly validation methods. As individual research labs, rather than centralized centers, begin to sequence the majority of new genomes, it is important to establish best practices for genome assembly. However, recent evaluations such as GAGE and the Assemblathon have concluded that there is no single best approach to genome assembly. Instead, it is preferable to generate multiple assemblies and validate them to determine which is most useful for the desired analysis; this is a labor-intensive process that is often impossible or unfeasible. To encourage best practices supported by the community, we present iMetAMOS, an automated ensemble assembly pipeline; iMetAMOS encapsulates the process of running, validating, and selecting a single assembly from multiple assemblies. iMetAMOS packages several leading open-source tools into a single binary that automates parameter selection and execution of multiple assemblers, scores the resulting assemblies based on multiple validation metrics, and annotates the assemblies for genes and contaminants. We demonstrate the utility of the ensemble process on 225 previously unassembled Mycobacterium tuberculosis genomes as well as a Rhodobacter sphaeroides benchmark dataset. On these real data, iMetAMOS reliably produces validated assemblies and identifies potential contamination without user intervention. In addition, intelligent parameter selection produces assemblies of R. sphaeroides comparable to or exceeding the quality of those from the GAGE-B evaluation, affecting the relative ranking of some assemblers. Ensemble assembly with iMetAMOS provides users with multiple, validated assemblies for each genome. Although computationally limited to small or mid-sized genomes, this approach is the most effective and reproducible means for generating high-quality assemblies and enables users to select an assembly best tailored to their specific needs.https://doi.org/10.1186/1471-2105-15-12

    Complex Microbiome Underlying Secondary and Primary Metabolism in the Tunicate-\u3cem\u3eProchloron\u3c/em\u3e Symbiosis

    Get PDF
    The relationship between tunicates and the uncultivated cyanobacterium Prochloron didemni has long provided a model symbiosis. P. didemni is required for survival of animals such as Lissoclinum patella and also makes secondary metabolites of pharmaceutical interest. Here, we present the metagenomes, chemistry, and microbiomes of four related L. patella tunicate samples from a wide geographical range of the tropical Pacific. The remarkably similar P. didemni genomes are the most complex so far assembled from uncultivated organisms. Although P. didemni has not been stably cultivated and comprises a single strain in each sample, a complete set of metabolic genes indicates that the bacteria are likely capable of reproducing outside the host. The sequences reveal notable peculiarities of the photosynthetic apparatus and explain the basis of nutrient exchange underlying the symbiosis. P. didemni likely profoundly influences the lipid composition of the animals by synthesizing sterols and an unusual lipid with biofuel potential. In addition, L. patella also harbors a great variety of other bacterial groups that contribute nutritional and secondary metabolic products to the symbiosis. These bacteria possess an enormous genetic potential to synthesize new secondary metabolites. For example, an antitumor candidate molecule, patellazole, is not encoded in the genome of Prochloron and was linked to other bacteria from the microbiome. This study unveils the complex L. patella microbiome and its impact on primary and secondary metabolism, revealing a remarkable versatility in creating and exchanging small molecules

    Improved reference genome of the arboviral vector Aedes albopictus

    Get PDF
    Background: The Asian tiger mosquito Aedes albopictus is globally expanding and has become the main vector for human arboviruses in Europe. With limited antiviral drugs and vaccines available, vector control is the primary approach to prevent mosquito-borne diseases. A reliable and accurate DNA sequence of the Ae. albopictus genome is essential to develop new approaches that involve genetic manipulation of mosquitoes. Results: We use long-read sequencing methods and modern scaffolding techniques (PacBio, 10X, and Hi-C) to produce AalbF2, a dramatically improved assembly of the Ae. albopictus genome. AalbF2 reveals widespread viral insertions, novel microRNAs and piRNA clusters, the sex-determining locus, and new immunity genes, and enables genome-wide studies of geographically diverse Ae. albopictus populations and analyses of the developmental and stage-dependent network of expression data. Additionally, we build the first physical map for this species with 75% of the assembled genome anchored to the chromosomes. Conclusion: The AalbF2 genome assembly represents the most up-to-date collective knowledge of the Ae. albopictus genome. These resources represent a foundation to improve understanding of the adaptation potential and the epidemiological relevance of this species and foster the development of innovative control measures
    corecore